Abstract

In this paper, global optimization (GO) Lipschitz problems are considered
where the multi-dimensional multiextremal objective function is determined
over a hyperinterval. An efficient one-dimensional GO method using local
tuning on the behavior of the objective function is generalized to the
multi-dimensional case by the diagonal approach using two partition
strategies. Global convergence conditions are established for the obtained
diagonal geometric methods. Results of a wide numerical comparison show a
strong acceleration reached by the new methods working with estimates of the
local Lipschitz constants over different subregions of the search domain in
comparison with the traditional approach.

Key Words: Global optimization - diagonal approach - local tuning - partition 
strategies. 

1 Introduction 

In [13, 14, 16] diagonal global optimization algorithms have been introduced for
solving multi-dimensional Lipschitz global optimization (GO) problems with box
constraints. In its general form such a problem can be stated as minimization of a
multiextremal function satisfying the Lipschitz condition with a constant
$0 < L < \infty$ over a hyperinterval, i.e., finding the value $f^*$ and points
$x^*$ such that

$$f^* = f(x^*) = \min_{x \in D} f(x), \tag{1}$$

where

$$|f(x') - f(x'')| \le L \|x' - x''\|, \quad x', x'' \in D \subset \mathbb{R}^N, \tag{2}$$

$$D = [a, b] = \{x \in \mathbb{R}^N : a \le x \le b\}, \quad a < b, \quad a, b \in \mathbb{R}^N. \tag{3}$$

Such problems arise frequently in real-life applications (for example,
in data classification, nonlinear approximation, globally optimized calibration of
complex system models, etc.). A number of such problems solved by the diagonal
methods can be found in ifTSI. 

The diagonal approach is a simple and powerful tool for extending one-dimensional
global optimization methods to the multi-dimensional case. The main idea is to
describe the behavior of the objective function $f(x)$ over a hyperinterval (we shall
also use the term cell or simply interval) $D_i = [a_i, b_i]$ by information obtained
from evaluating $f(x)$ at the vertices $a_i$, $b_i$, the endpoints of the main
diagonal defining the interval $D_i$. During every $(l+1)$-th iteration, a
characteristic $R_i = R(a_i, b_i, f(a_i), f(b_i))$ is associated with each subinterval
$D_i \subset D$ generated in the course of the previous $l$ iterations, in such a way
that $R_i$ tends to be higher if $D_i$ contains lower values of $f(x)$. Then, among
all subintervals created so far within $D$, an interval $D_t$ with the maximal
characteristic is chosen for further subdivision. It is subdivided into $p$ subcells
and $f(x)$ is evaluated at the vertices $a_j$, $b_j$ of all the intervals $D_j$,
$1 \le j \le p$. The process is repeated until a stopping rule is satisfied.

The diagonal method proposed in [13, 14, 16], which extends the univariate
algorithm from [12], uses a global estimate of the Lipschitz constant $L$.
GO algorithms that use the global Lipschitz constant $L$ (or its estimates)
do not take into account local information about the behavior of the objective
function over every small subregion of $D$. In fact, such algorithms (see [11])
suppose that $f(x)$ has the same constant $L$ over every subdomain of $D$, paying
no attention to situations where $f(x)$ has a very low local Lipschitz constant over
the subdomain under consideration. It has been shown for a number of global
optimization algorithms (see [18, 19, 23]) that using local information for
estimating local Lipschitz constants can accelerate the global search significantly.
The importance of such information in the context of the diagonal approach has been
highlighted in [16]. Of course, the local data must be balanced in an appropriate
way with the global information about the objective function; otherwise the global
solution can be lost.

In this paper an efficient deterministic one-dimensional GO method using local
tuning on the behavior of the objective function (see [19]) is extended to the
multi-dimensional case by the diagonal approach, giving a new diagonal algorithm.
Two partition strategies widely used in the literature [3, 9, 13, 14, 16] are
considered:

- Bisection, where $p = 2$ and the interval $D_t$ is subdivided into two
  subintervals by a hyperplane orthogonal to the longest edge of $D_t$;

- Partition $2^N$, where $p = 2^N$ and $D_t$ is partitioned into $2^N$ new
  subintervals generated by the intersection of the boundary of $D_t$ and the
  hyperplanes that contain a point belonging to the main diagonal of $D_t$ and are
  parallel to the boundary hypersurfaces of $D_t$.
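The two partition strategies can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: each cell is stored by its two diagonal vertices (componentwise lower and upper corners), the names `bisect`, `partition_2n` and `new_trial_points` are hypothetical, and the Bisection cut is assumed to pass through the chosen diagonal point `x`, which the text does not specify.

```python
from itertools import product

def bisect(a, b, x):
    """Bisection: cut [a, b] by a hyperplane orthogonal to its longest edge.

    Assumption: the cutting hyperplane passes through the diagonal point x.
    """
    j = max(range(len(a)), key=lambda i: b[i] - a[i])  # index of the longest edge
    left_b, right_a = list(b), list(a)
    left_b[j] = right_a[j] = x[j]
    return [(list(a), left_b), (right_a, list(b))]

def partition_2n(a, b, x):
    """Partition 2^N: cut [a, b] by N axis-parallel hyperplanes through x."""
    cells = []
    for choice in product((0, 1), repeat=len(a)):
        lo = [x[i] if c else a[i] for i, c in enumerate(choice)]
        hi = [b[i] if c else x[i] for i, c in enumerate(choice)]
        cells.append((lo, hi))
    return cells

def new_trial_points(cells, a, b):
    """Distinct cell vertices at which f(x) has not been evaluated yet."""
    old = {tuple(a), tuple(b)}
    return {tuple(v) for cell in cells for v in cell} - old
```

With this representation, Bisection produces two new vertices and Partition $2^N$ produces $2 \times 2^N - 3$ of them, since the diagonal point is shared by the two subcells lying along the main diagonal.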

The new method uses local information about the objective function over the
whole search region $D$ during the global search, in contrast with techniques that
do so only in a neighborhood of local minima after stopping their global procedures
(see e.g. Q). Global convergence conditions are established for the new method. 
Results of a wide numerical comparison show a strong acceleration reached by the 
new method working with estimates of the local Lipschitz constants over different 
subregions of the search domain in comparison with the traditional approach using 
global estimates of L. 



2 The new algorithm with local tuning 

In this section the New Diagonal Algorithm with Local tuning (NDAL) is described.

The method starts by setting the number of iterations, $l$, and the number of
generated intervals, $m = m(l)$, equal to 1. The first two trials (evaluations of the
objective function) are executed at the points $x_0 = a$, $x_1 = b$ from (3). The
results of the trials are denoted by $z_0 = f(x_0)$, $z_1 = f(x_1)$, and the initial
number $k = k(l)$ of trial points generated by the algorithm is set to 2. The initial
estimate of the global optimum is taken as $z^* = \min\{z_0, z_1\}$. The estimate
$\lambda_1$ of the local Lipschitz constant over the initial interval
$D_1 = D = [a, b]$ (in this case, of course, the local estimate coincides with the
global one) is calculated as follows:

$$\lambda_1 = \frac{|f(a) - f(b)|}{\|a - b\|}.$$

Suppose now that $l \ge 1$ iterations of the method have already been executed.
The iteration $l + 1$ consists of the following steps.

Step 1. For each interval $D_i = [a_i, b_i]$, $1 \le i \le m(l)$, calculate its
characteristic

$$R_i = 0.5\,(K_i \|a_i - b_i\| - f(a_i) - f(b_i)), \tag{4}$$

where

$$K_i = K_i(l) = \left(r + \frac{C}{l}\right) \max\{\lambda_i, \gamma_i, \xi\}, \tag{5}$$

the values $r > 1$, $\xi > 0$, and $C > 0$ are parameters of the method,
$\lambda_i$ is the estimate of the local Lipschitz constant over the interval $D_i$
calculated at the moment of creation of $D_i$, and

$$\gamma_i = \mu \, \frac{\|a_i - b_i\|}{d^{\max}}. \tag{6}$$

The values $\mu$ and $d^{\max}$ are evaluated as follows:

$$\mu = \max_{1 \le i \le m(l)} \lambda_i, \tag{7}$$

$$d^{\max} = \max_{1 \le i \le m(l)} \|a_i - b_i\|. \tag{8}$$

Step 2. Among all the intervals $D_i$ choose an interval $D_t$ such that

$$R_t = \max_{1 \le i \le m(l)} R_i. \tag{9}$$

Step 3. If

$$\|a_t - b_t\| > \varepsilon \|a - b\|,$$

where $a$ and $b$ are from (3) and $t$ is from (9), then go to Step 4; otherwise
take the value

$$z^*_l = \min_{1 \le i \le k(l)} f(x_i)$$

(where $x_i$, $1 \le i \le k(l)$, are the trial points generated by the algorithm in
the course of the previous $l$ iterations) as an estimate of the global optimum
of the problem (1)-(3) and Stop.

Step 4. Choose the new point $x^{l+1}$ belonging to the main diagonal (the diagonal
joining the vertices $a_t$ and $b_t$) of the subinterval $D_t$, where $t$ is from (9),
as follows (see [13, 14, 16]):

$$x^{l+1} = \frac{a_t + b_t}{2} - \frac{f(b_t) - f(a_t)}{2K} \times \frac{b_t - a_t}{\|a_t - b_t\|}. \tag{10}$$

Here

$$K = K(l) = \left(r + \frac{C}{l}\right) \max\{\mu, \xi\}, \tag{11}$$

where $\xi$ is from (5) and $\mu$ is from (7).

Step 5. Subdivide the interval $D_t$ into $p$ new subintervals by the Bisection
strategy or by Partition $2^N$.

Step 6. Denote by $x_i$, $i = 1, \ldots, s$, the vertices of the new $p$ subintervals
generated during Step 5 where $f(x)$ must be evaluated.

- In the case of the Bisection strategy it is necessary to evaluate $f(x)$ at two
  vertices, $s = 2$ (the points $a_t$ and $b_t$ come from the subdivided interval
  $D_t$ and $f(x)$ has already been evaluated at its vertices during the previous
  iterations).

- In the case of Partition $2^N$, the number $s = 2 \times 2^N - 3$ because the new
  $2^N$ subintervals are identified by their two vertices, $x^{l+1}$ is common to two
  intervals, and $f(a_t)$ and $f(b_t)$ of the subdivided interval $D_t$ have already
  been evaluated.
Step 7. For all the new intervals $D_i$, $1 \le i \le p$, get an estimate of the local
Lipschitz constant as

$$\lambda_i = \max\left\{ \frac{|f(a_t) - f(b_t)|}{\|a_t - b_t\|}, \; \max_{1 \le j \le p} \frac{|f(a_j) - f(b_j)|}{\|a_j - b_j\|} \right\}. \tag{12}$$

Set $l := l + 1$, $m := m + p - 1$, $k := k + s$, and go to Step 1.
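The steps above can be sketched in Python for the Bisection strategy. This is a minimal sketch and not the authors' implementation. It assumes that cells are stored by their componentwise lower and upper corners, that the Bisection cut passes through the point $x^{l+1}$ (the text does not say where the hyperplane is placed), and that (12) combines the slope of the parent cell with the slopes of the new cells. The function name `ndal_bisection` and its default parameter values are hypothetical.

```python
import math

def ndal_bisection(f, a, b, r=1.1, C=10.0, xi=1e-6, eps=0.01, max_iter=100000):
    """Sketch of Steps 1-7 with Bisection; returns (best value, number of trials)."""
    def slope(c):
        return abs(c["fa"] - c["fb"]) / math.dist(c["a"], c["b"])

    root = {"a": list(a), "b": list(b), "fa": f(a), "fb": f(b)}
    root["lam"] = slope(root)                      # initial estimate lambda_1
    cells, trials = [root], 2
    best = min(root["fa"], root["fb"])
    full_diag = math.dist(a, b)

    for l in range(1, max_iter + 1):
        factor = r + C / l
        mu = max(c["lam"] for c in cells)                       # (7)
        d_max = max(math.dist(c["a"], c["b"]) for c in cells)   # (8)

        def characteristic(c):                                  # (4)-(6)
            d = math.dist(c["a"], c["b"])
            k_i = factor * max(c["lam"], mu * d / d_max, xi)
            return 0.5 * (k_i * d - c["fa"] - c["fb"])

        t = max(range(len(cells)), key=lambda i: characteristic(cells[i]))  # (9)
        cell = cells[t]
        d_t = math.dist(cell["a"], cell["b"])
        if d_t <= eps * full_diag:                              # Step 3: stop
            break

        k_glob = factor * max(mu, xi)                           # (11)
        shift = (cell["fb"] - cell["fa"]) / (2.0 * k_glob * d_t)
        x = [(p + q) / 2.0 - shift * (q - p)                    # (10)
             for p, q in zip(cell["a"], cell["b"])]

        # Step 5 (assumption: cut through x, orthogonal to the longest edge)
        j = max(range(len(x)), key=lambda i: cell["b"][i] - cell["a"][i])
        left_b, right_a = list(cell["b"]), list(cell["a"])
        left_b[j] = right_a[j] = x[j]
        left = {"a": cell["a"], "b": left_b, "fa": cell["fa"], "fb": f(left_b)}
        right = {"a": right_a, "b": cell["b"], "fa": f(right_a), "fb": cell["fb"]}
        trials += 2                                             # Step 6: s = 2

        lam = max(slope(cell), slope(left), slope(right))       # (12), as assumed
        left["lam"] = right["lam"] = lam
        cells[t:t + 1] = [left, right]
        best = min(best, left["fb"], right["fa"])
    return best, trials
```

A call such as `ndal_bisection(lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2, [-1.0, -1.0], [1.0, 1.0], eps=0.05)` returns the lowest value found and the number of trials. The linear scan over all cells keeps the sketch short; a priority queue would be the natural choice for larger runs.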

Let us give a few comments on the introduced method. The key idea of the
algorithm is estimating local Lipschitz constants by balancing local and global
data. In contrast with the traditional approach (see [13, 14]), where the global
estimate $K$ of the Lipschitz constant $L$ from (2) is used in the form (11), the
local estimate $K_i$ from (5) is the result of a balance between the local and the
global information represented by the values $\lambda_i$ and $\gamma_i$,
respectively. When the subinterval $D_i$ has a small main diagonal (in comparison
with the current maximal diagonal $d^{\max}$ over all subintervals in $D$), then
(see (6)-(8)) $\gamma_i$ is small too and the local information represented by
$\lambda_i$ has a decisive influence (see (5)) on $K_i$. When the interval $D_i$ is
very wide (its diagonal $\|a_i - b_i\|$ is close to $d^{\max}$), the local
information is not reliable and the global information (see (5)) represented by
$\gamma_i$ is used.
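A small numerical illustration of this balance is given below. It is only a sketch: the helper names `local_estimate` and `global_estimate` and the sample state ($\mu = 8$, $d^{\max} = 2$, $l = 100$, local slope 0.5) are hypothetical values chosen for illustration, not data from the experiments.

```python
import math

def local_estimate(lam_i, diag_i, mu, d_max, l, r=1.1, C=10.0, xi=1e-6):
    """K_i from (5), with gamma_i computed as in (6)."""
    gamma_i = mu * diag_i / d_max
    return (r + C / l) * max(lam_i, gamma_i, xi)

def global_estimate(mu, l, r=1.1, C=10.0, xi=1e-6):
    """K from (11)."""
    return (r + C / l) * max(mu, xi)

# Hypothetical state of the search: mu = 8, d_max = 2, iteration l = 100.
small_cell = local_estimate(lam_i=0.5, diag_i=0.02, mu=8.0, d_max=2.0, l=100)
wide_cell = local_estimate(lam_i=0.5, diag_i=2.0, mu=8.0, d_max=2.0, l=100)
print(small_cell, wide_cell, global_estimate(8.0, 100))
```

For the small cell the estimate is driven by $\lambda_i$ (about 0.6 here), while for the widest cell it coincides with the global estimate (about 9.6).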

The values $r$, $C$, and $\xi$ influence $K_i$ as global parameters. By increasing
$r$ and $C$ we increase the reliability of the method over the whole region $D$. The
parameter $\xi > 0$ is a small number allowing the NDAL to work also when
$f(x_i) = \mathrm{const}$ for all trial points $x_i$. The importance of the parameter
$\xi$ for the correct work of the method can be seen from (4)-(5) and (10)-(11).
If $\gamma_i < \xi$ and $\lambda_i < \xi$, it follows that

$$K_i(l) = K(l) = \left(r + \frac{C}{l}\right)\xi.$$

Of course, this case is degenerate for the method.

The introduced algorithm belongs to the class of diagonally extended geometric
algorithms and also to the more general classes of adaptive partition and divide the
best algorithms (see [15, 13] and [20], respectively). Let us study the convergence
properties of the infinite ($\varepsilon = 0$ in the stopping rule) sequence
$\{y^k\}$ of trial points generated by the NDAL during minimization of the function
$f(x)$ from (1)-(3). Hereinafter we shall denote by $Y'$ the set of limit points of
the sequence $\{y^k\}$.

Theorem 1 Let $y'$ be a limit point of the sequence $\{y^k\}$; then, for all trial
points $y^k \in \{y^k\}$, it follows that $f(y^k) \ge f(y')$. If there exists another
limit point $y'' \in Y'$, then $f(y') = f(y'')$.

Proof. This result can be obtained as a particular case of the general convergence
study from [20] and its proof is therefore omitted. □

The next theorem presents sufficient global convergence conditions for the 
NDAL. 
Theorem 2 Let there exist an iteration number $l^*$ such that for a cell $D_j$,
$j = j(l)$, containing a global minimizer $x^*$ of $f(x)$ during the $l$-th iteration
of the NDAL the following inequality holds:

$$K_j(l) \ge 2H_j, \quad l > l^*, \tag{13}$$

where

$$H_j = \max\left\{ \frac{f(a_j) - f(x^*)}{\|x^* - a_j\|}, \; \frac{f(b_j) - f(x^*)}{\|b_j - x^*\|} \right\}. \tag{14}$$

Then $x^*$ is a limit point of the trial sequence $\{y^k\}$ generated by the NDAL.

Proof. We start the proof by showing that the estimates $K_i(l)$ of the local
Lipschitz constants $L_i$ from (5) are bounded. In fact, since the global Lipschitz
constant $L < \infty$ and the constants $r > 1$, $C > 0$, and $\xi > 0$, it follows
that

$$0 < r\xi < K_i(l) \le (r + C)\max\{L, \xi\} < \infty, \quad l \ge 1. \tag{15}$$

Suppose that there exists a limit point $y' \ne x^*$ of the trial sequence
$\{y^k\}$. Taking into consideration (4), (10), (11), and (15), we can conclude for an
interval $D_i$, $i = i(l)$, containing $y'$ during the $l$-th iteration of the NDAL,
that

$$\lim_{l \to \infty} R_i(l) = -f(y'). \tag{16}$$

Consider now the cell $D_j$, $j = j(l)$, such that the global minimizer
$x^* \in D_j$ and suppose that $x^*$ is not a limit point of $\{y^k\}$. This means
that there exists an iteration number $q$ such that for all $l > q$

$$x^{l+1} \notin D_j, \quad j = j(l).$$

Let us now estimate the characteristic $R_j(l)$, $l > q$, of the interval $D_j$. It
follows from (14) and the fact that $x^* \in D_j$ that

$$f(a_j) - f(x^*) \le H_j \|a_j - x^*\| \le H_j \|a_j - b_j\|,$$

$$f(b_j) - f(x^*) \le H_j \|b_j - x^*\| \le H_j \|a_j - b_j\|.$$

Then, by summing these inequalities we obtain

$$f(a_j) + f(b_j) \le 2f(x^*) + 2H_j \|a_j - b_j\|.$$

From this inequality and (13), (14) we can deduce for all iteration numbers
$l > l^*$ that

$$R_j(l) = 0.5\,(K_j \|a_j - b_j\| - f(a_j) - f(b_j)) \ge 0.5\,(K_j \|a_j - b_j\| - 2f(x^*) - 2H_j \|a_j - b_j\|) = 0.5\,\|a_j - b_j\|\,(K_j - 2H_j) - f(x^*) \ge -f(x^*). \tag{17}$$
Since $x^*$ is a global minimizer, it follows from (16) and (17) that an iteration
number $q^* > \max\{l^*, q\}$ will exist such that

$$R_j(q^*) > R_i(q^*).$$

But this means that during the $q^*$-th iteration, trials will be executed at the
cell $D_j$. Thus, our assumption that $x^*$ is not a limit point of $\{y^k\}$ is not
true and the theorem has been proved. □
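The sufficient condition of Theorem 2 can be checked numerically when a global minimizer is known in advance, as is the case for test problems. The sketch below is illustrative only: it assumes that $H_j$ is the larger of the two slopes from $x^*$ to the diagonal vertices of the cell, and the helper names `h_j` and `condition_13_holds` are hypothetical.

```python
import math

def h_j(f, a_j, b_j, x_star):
    """H_j for a cell [a_j, b_j] containing the known global minimizer x_star."""
    f_star = f(x_star)
    ratios = [(f(v) - f_star) / math.dist(v, x_star)
              for v in (a_j, b_j) if math.dist(v, x_star) > 0.0]
    return max(ratios, default=0.0)

def condition_13_holds(k_j, h):
    """Check the inequality K_j >= 2 H_j."""
    return k_j >= 2.0 * h

# Example: f(x) = |x_1| + |x_2| with the minimizer at the origin.
f = lambda p: abs(p[0]) + abs(p[1])
h = h_j(f, [-1.0, -1.0], [1.0, 1.0], [0.0, 0.0])
print(h, condition_13_holds(3.0, h), condition_13_holds(2.0, h))
```

In a real run $x^*$ is unknown, so such a check is useful only for studying the behavior of the estimates $K_j(l)$ on benchmark functions.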

Let us denote the set of global minimizers of the problem (1)-(3) by $X^*$. Then
the following corollary ensures the inclusion $Y' \subseteq X^*$.

Corollary 1 Given the conditions of Theorem 2, all limit points of the sequence
$\{y^k\}$ are global minimizers of $f(x)$, i.e., $Y' \subseteq X^*$.

Proof. The corollary follows immediately from Theorems 1 and 2. □

The sets $Y'$ and $X^*$ coincide if the conditions established by Corollary 2 are
fulfilled.

Corollary 2 If condition (13) is fulfilled for all points $x^* \in X^*$, then the set
of limit points of $\{y^k\}$ coincides with the set of global minimizers of the
objective function $f(x)$, i.e., $Y' = X^*$.

Proof. Again, the corollary is a straightforward consequence of Theorems 1 and 2. □

3 Numerical comparison 

The goal of this section is twofold: first, to show the advantages of local tuning
in comparison with the traditional approach using global estimates of the Lipschitz
constant; second, to establish which of the two partitioning strategies, Bisection
or Partition $2^N$, works better.
Table 1: Test problems

| N° | Formula | Domain | Source |
|----|---------|--------|--------|
| 1 | $0.25x_1^4 - 0.5x_1^2 + 0.1x_1 + 0.5x_2^2$ | $[-10, 10]^2$ | [?] |
| 2 | $(4 - 2.1x_1^2 + x_1^4/3)x_1^2 + x_1x_2 + (-4 + 4x_2^2)x_2^2$ | $[-2.5, 2.5] \times [-1.5, 1.5]$ | [?] |
| 3 | $2x_1^2 - 1.05x_1^4 + x_1^6/6 + x_1x_2 + x_2^2$ | $[-5, 5]^2$ | [?] |
| 4 | $(x_2 - 5.1x_1^2/(4\pi^2) + 5x_1/\pi - 6)^2 + 10(1 - 1/(8\pi))\cos x_1 + 10$ | $[-5, 10] \times [0, 15]$ | [?] |
| 5 | $(1 - 2x_2 + 0.05\sin(4\pi x_2) - x_1)^2 + (x_2 - 0.5\sin(2\pi x_1))^2$ | $[-10, 10]^2$ | [?] |
| 6 | $[1 + (x_1 + x_2 + 1)^2(19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1x_2 + 3x_2^2)] \times [30 + (2x_1 - 3x_2)^2(18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1x_2 + 27x_2^2)]$ | $[-2, 2]^2$ | [?] |
| 7 | $\sum_{i=1}^{5} i\cos((i+1)x_1 + i) \sum_{j=1}^{5} j\cos((j+1)x_2 + j)$ | $[-10, 10]^2$ | [?] |
| 8 | $\sum_{i=1}^{5} i\cos((i+1)x_1 + i) \sum_{j=1}^{5} j\cos((j+1)x_2 + j) + (x_1 + 1.42513)^2 + (x_2 + 0.80032)^2$ | $[-10, 10]^2$ | [?] |
| 9 | $100(x_2 - x_1^2)^2 + (x_1 - 1)^2$ | $[-2, 8]^2$ | |
| 10 | $(x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$ | $[-6, 6]^2$ | [?] |
| 11 | $-4x_1x_2\sin(4\pi x_2)$ | $[0, 1]^2$ | [?] |
| 12 | $-\sin(2x_1 + 1) - 2\sin(3x_2 + 2)$ | $[0, 1]^2$ | [?] |
| 13 | $(x_1 - 2)^2 + (x_2 - 1)^2 - 0.04/(0.25x_1^2 + x_2^2 - 1) + 5(x_1 - 2x_2 + 1)^2$ | $[1, 2]^2$ | [?] |
| 14 | $-\lvert\sin(x_1)\sin(2x_2)\rvert + 0.01(x_1x_2 + (x_1 - \pi)^2 + 3(x_2 - \pi)^2)$ | $[0, 2\pi]^2$ | [?] |
| 15 | $(\pi/n)\{10\sin^2(\pi y_1) + \sum_{i=1}^{n-1}[(y_i - 1)^2(1 + 10\sin^2(\pi y_{i+1}))] + (y_n - 1)^2\}$, where $y_i = 1 + (1/4)(x_i - 1)$, $i = 1, \ldots, n$ | $[-10, 10]^n$ | [?] |
| 16 | $0.1\{\sin^2(3\pi x_1) + \sum_{i=1}^{n-1}[(x_i - 1)^2(1 + \sin^2(3\pi x_{i+1}))]\} + 0.1(x_n - 1)^2[1 + \sin^2(2\pi x_n)]$ | $[-10, 10]^n$ | [?] |
| 17 | $-\sum_{i=1}^{4} c_i \exp\left(-\sum_{j=1}^{3} \alpha_{ij}(x_j - p_{ij})^2\right)$ | $[0, 1]^3$ | |
| 18 | $100[x_3 - 0.25(x_1 + x_2)^2]^2 + (1 - x_1)^2 + (1 - x_2)^2$ | | [?] |
| 19 | $(x_1 - 2x_2 + x_3)\sin(x_1)\sin(x_2)\sin(x_3)$ | $[-1, 1]^3$ | [?] |
| 20 | $\sum_{i=1}^{3}[(x_1 - x_i^2)^2 + (x_i - 1)^2]$ | $[-10, 10]^3$ | [?] |
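A few of the test problems can be coded directly for experiments. The sketch below is a hedged illustration: it assumes that Problems 9, 10 and 12 have the forms $100(x_2 - x_1^2)^2 + (x_1 - 1)^2$, $(x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$ and $-\sin(2x_1 + 1) - 2\sin(3x_2 + 2)$ over the domains $[-2, 8]^2$, $[-6, 6]^2$ and $[0, 1]^2$, respectively. The names `problem_9`, `problem_10`, `problem_12` and `DOMAINS` are hypothetical.

```python
import math

def problem_9(x):
    """Problem 9 (assumed form)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (x[0] - 1.0) ** 2

def problem_10(x):
    """Problem 10 (assumed form)."""
    return (x[0] ** 2 + x[1] - 11.0) ** 2 + (x[0] + x[1] ** 2 - 7.0) ** 2

def problem_12(x):
    """Problem 12 (assumed form)."""
    return -math.sin(2.0 * x[0] + 1.0) - 2.0 * math.sin(3.0 * x[1] + 2.0)

# Search domains as pairs (a, b) of diagonal vertices.
DOMAINS = {
    9: ([-2.0, -2.0], [8.0, 8.0]),
    10: ([-6.0, -6.0], [6.0, 6.0]),
    12: ([0.0, 0.0], [1.0, 1.0]),
}

# The first two trials of a diagonal method are executed at a and b.
for number, func in ((9, problem_9), (10, problem_10), (12, problem_12)):
    a, b = DOMAINS[number]
    print(number, func(a), func(b))
```

Storing each domain as its two diagonal vertices matches the way the diagonal approach identifies a hyperinterval.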



Thus, four methods are compared:

- the traditional method with Partition $2^N$ and the global estimate;

- the traditional method with Bisection and the global estimate;

- the new algorithm using local tuning and Partition $2^N$;

- the new algorithm using local tuning and Bisection.

The list of problems used in the experiments is shown in Table 1, where the
following quantities are specified:

N°: problem number;

Formula: formula of the test function;
Table 2: Results of numerical experiments with two-dimensional functions for r = 1.1

| Problem Number | Global Estimate, Partition $2^N$ | Global Estimate, Bisection | Local Tuning, Partition $2^N$ | Local Tuning, Bisection |
|----------------|------------|------------|------------|------------|
| 1 | 12412 | 8950 | 4742 | 3508 |
| 2 | 8037 | 2670 | 2947 | 1354 |
| 3 | 19427 | 20392 | 14832 | 14244 |
| 4 | 4687 | 2762 | 1332 | 998 |
| 5 | 4187 | 2818 | 807 | 602 |
| 6 | 20522 | 17732 | 14572 | 10924 |
| 7 | 6837 | 4766 | 5532 | 3936 |
| 8 | 4057 | 3922 | 2822 | 3372 |
| 9 | 16187 | 16446 | 10307 | 7328 |
| 10 | 6267 | 4384 | 1797 | 1286 |
| 11 | 312 | 256 | 272 | 146 |
| 12 | 292 | 200 | 167 | 96 |
| 13 | 1827 | 2002 | 282 | 238 |
| 14 | 1127 | 96* | 592 | 186 |
| 15 | 4857 | 2736 | 2237 | 1336 |
| 16 | 1627 | 532 | 492 | 118 |
| Average | 7041.36 | 5666.50 | 3983.25 | 3104.50 |



Domain : feasible region of the test function; 

Source : bibliographic reference. 

Problems 1-14 are two-dimensional, problems 17-20 are three-dimensional, 
and problems 15-16 are of arbitrary dimension n > 1 (n = 2 and n = 3 have been 
used). 

To show the influence of the parameter $r$ on the search characteristics, the
experiments for the two-dimensional case have been carried out for two different
values of the parameter $r$ in all the methods: $r = 1.1$ and $r = 1.3$. The value
$C = 10$ was used in all the two-dimensional experiments. These experiments were
executed with the accuracy $\varepsilon = 0.01$ in the stopping rule.

The numbers of function evaluations executed by the methods before satisfaction
of the stopping rule for the two-dimensional case are reported in Tables 2 and 3.
Global optima have been located in all the experiments with one exception: for
Problem 14 and the method with the global estimate of the Lipschitz constant and the
Bisection strategy, the value $r = 1.1$ was too small and the method did not locate
the global minimizer (the corresponding entry of Table 2 is marked by an asterisk).
A sufficient value of the reliability parameter $r$ for finding the global minimizer
of Problem 14 is $r = 1.3$.
Table 3: Results of numerical experiments with two-dimensional functions for r = 1.3

| Problem Number | Global Estimate, Partition $2^N$ | Global Estimate, Bisection | Local Tuning, Partition $2^N$ | Local Tuning, Bisection |
|----------------|------------|------------|------------|------------|
| 1 | 13987 | 9874 | 7012 | 5620 |
| 2 | 9862 | 4774 | 3357 | 2072 |
| 3 | 20057 | 21608 | 16802 | 16754 |
| 4 | 5812 | 3728 | 2332 | 1190 |
| 5 | 4817 | 3180 | 1402 | 650 |
| 6 | 21922 | 22424 | 17812 | 12622 |
| 7 | 7267 | 7374 | 6422 | 5128 |
| 8 | 5467 | 4504 | 3717 | 3938 |
| 9 | 16752 | 17378 | 10852 | 8250 |
| 10 | 8852 | 6820 | 3432 | 1858 |
| 11 | 417 | 324 | 362 | 174 |
| 12 | 347 | 232 | 177 | 114 |
| 13 | 2102 | 2306 | 307 | 284 |
| 14 | 1297 | 800 | 747 | 360 |
| 15 | 7167 | 3880 | 3137 | 1740 |
| 16 | 1852 | 778 | 612 | 162 |
| Average | 7998.56 | 6874.00 | 4905.13 | 3807.25 |



In Table 4 the experimental results for three-dimensional test functions are
shown. The following parameters have been chosen in all the experiments:
$r = 1.2$, $C = 100$. The search accuracy $\varepsilon = 0.02$ has been used.

The performance of all the methods on Problem 10 is illustrated in
Figs. 1-4. Trial points are shown by black dots.

The new algorithm was faster than the method using the global estimate for
both strategies in all the cases. Smaller values of the accuracy $\varepsilon$ ensure
higher values of acceleration. For example, Table 5 shows that the NDAL works better
when the accuracy increases and the improvement is stronger for higher values of the
parameter $r$.
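The dependence of the acceleration on the accuracy can be quantified directly from the trial counts reported in Table 5 for Problem 7. The following sketch computes the ratio of trials (global estimate over local tuning) for each strategy. The layout of `TABLE_5` and the helper name `acceleration` are hypothetical; the numbers are those of Table 5.

```python
# Trials for Problem 7 from Table 5:
# {r: {eps: (global Partition, global Bisection, local Partition, local Bisection)}}
TABLE_5 = {
    1.1: {0.01: (6837, 4766, 5532, 3936),
          0.001: (10742, 11664, 7012, 4662),
          0.0001: (35697, 32218, 7367, 4694)},
    1.3: {0.01: (7267, 7374, 6422, 5128),
          0.001: (23712, 17322, 8962, 8270),
          0.0001: (54397, 42584, 11862, 8582)},
}

def acceleration(r):
    """Rows (eps, Partition ratio, Bisection ratio), from coarse to fine accuracy."""
    rows = []
    for eps in sorted(TABLE_5[r], reverse=True):
        gp, gb, lp, lb = TABLE_5[r][eps]
        rows.append((eps, gp / lp, gb / lb))
    return rows

for r in TABLE_5:
    for eps, part, bis in acceleration(r):
        print(f"r={r} eps={eps:<7} Partition: {part:.2f}  Bisection: {bis:.2f}")
```

For both values of $r$ and both strategies the ratio grows as $\varepsilon$ decreases.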

It can be seen from the numerical experiments that the new method with local
tuning significantly outperforms the traditional approach. In turn, Bisection
works better than the Partition $2^N$ strategy. The best combination is the new
algorithm with local tuning working with the Bisection strategy.
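These conclusions can be summarized by simple ratios of the average numbers of trials reported in Tables 2-4. The sketch below uses the averages as printed in the tables; the dictionary `AVERAGES` and the helper `summarize` are hypothetical names introduced for illustration.

```python
# Average trials: (global Partition, global Bisection, local Partition, local Bisection)
AVERAGES = {
    "Table 2 (2D, r = 1.1)": (7041.36, 5666.50, 3983.25, 3104.50),
    "Table 3 (2D, r = 1.3)": (7998.56, 6874.00, 4905.13, 3807.25),
    "Table 4 (3D, r = 1.2)": (70299.50, 25562.67, 33156.33, 6782.67),
}

def summarize(avg):
    """Speed-up ratios computed from one row of average trial counts."""
    gp, gb, lp, lb = avg
    return {
        "local_vs_global_partition": gp / lp,
        "local_vs_global_bisection": gb / lb,
        "bisection_vs_partition_local": lp / lb,
    }

for name, avg in AVERAGES.items():
    ratios = summarize(avg)
    print(name, {key: round(value, 2) for key, value in ratios.items()})
```

All ratios exceed 1 for every table, and local tuning with Bisection has the smallest average in each case.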

Higher values of the parameter $r$ increase the reliability of the methods and lead
to a fast growth of the number of iterations. This happens because by increasing $r$
we uniformly increase the estimates of the Lipschitz constants (both global and local
ones). The obtained improvement increases for higher values of the parameter $r$.
Table 4: Results of numerical experiments with three-dimensional functions for r = 1.2

| Problem Number | Global Estimate, Partition $2^N$ | Global Estimate, Bisection | Local Tuning, Partition $2^N$ | Local Tuning, Bisection |
|----------------|------------|------------|------------|------------|
| 15 | 173513 | 43780 | 98412 | 12060 |
| 16 | 26938 | 3732 | 12625 | 1032 |
| 17 | 6879 | 1810 | 4825 | 1020 |
| 18 | 83475 | 27760 | 15862 | 3470 |
| 19 | 8556 | 2040 | 7568 | 1358 |
| 20 | 122436 | 74254 | 59646 | 21756 |
| Average | 70299.50 | 25562.67 | 33156.33 | 6782.67 |



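Table 5: Number of trials for Problem 7 in dependence on the parameter r and accuracy ε

| r | ε | Global Estimate, Partition $2^N$ | Global Estimate, Bisection | Local Tuning, Partition $2^N$ | Local Tuning, Bisection |
|-----|--------|-------|-------|-------|------|
| 1.1 | 0.0100 | 6837 | 4766 | 5532 | 3936 |
| 1.1 | 0.0010 | 10742 | 11664 | 7012 | 4662 |
| 1.1 | 0.0001 | 35697 | 32218 | 7367 | 4694 |
| 1.3 | 0.0100 | 7267 | 7374 | 6422 | 5128 |
| 1.3 | 0.0010 | 23712 | 17322 | 8962 | 8270 |
| 1.3 | 0.0001 | 54397 | 42584 | 11862 | 8582 |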
Figure 1: Level curves of Problem 10 with the trial points generated by the
Partition $2^N$ strategy and the method with the global estimate of the Lipschitz
constant with r = 1.1; the number of trials = 6267

Figure 2: Level curves of Problem 10 with the trial points generated by the
Bisection strategy and the method with the global estimate of the Lipschitz constant
with r = 1.1; the number of trials = 4384

Figure 3: Level curves of Problem 10 with the trial points generated by the
Partition $2^N$ strategy and the method with local tuning with r = 1.1; the number
of trials = 1797

Figure 4: Level curves of Problem 10 with the trial points generated by the
Bisection strategy and the method with local tuning with r = 1.1; the number of
trials = 1286
If in the search region there exists a neighborhood of the global solution having
local Lipschitz constants smaller than the global one (this is true, for example, for
differentiable functions having the global solution at an interior point of the
search domain), then smaller values of the accuracy $\varepsilon$ ensure higher
values of acceleration.